Feat/local docker sandbox #172

mdear · 2025-12-29T15:12:04Z

Pull Request: Local Docker Sandbox Provider & Resource Management

Summary
This PR introduces a complete local Docker sandbox provider for air-gapped/local deployments, comprehensive resource management limits, and extensive test coverage (~343 new tests).

Key Features
🐳 Local Docker Sandbox Provider

New DockerSandbox provider enabling deployments without E2B cloud dependency
PortPoolManager for centralized port allocation (30000-30999)
expose_port(external) parameter: external=True returns localhost:port for browser access, external=False returns internal Docker IP for container-to-container communication
Orphan container cleanup loop (5min interval) removes containers without active sessions
/internal/sandboxes/{id}/has-active-session endpoint for session verification
port_manager.scan_existing_containers() recovers state on restart
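
To make the port handling above concrete, here is a minimal sketch of a centralized port pool and the `external` switch. It is not the PR's implementation: the method names `allocate`, `register`, and `release_all`, the standalone `expose_port` signature, and the `ip:port` format returned for `external=False` are all assumptions for illustration. Only the 30000-30999 range and the `localhost:port` result come from the description.

```python
import threading


class PortPoolManager:
    """Hypothetical sketch of a centralized host-port pool (not the PR's class)."""

    def __init__(self, start: int = 30000, end: int = 30999) -> None:
        self._lock = threading.Lock()
        self._free = set(range(start, end + 1))
        self._by_sandbox: dict[str, set[int]] = {}

    def allocate(self, sandbox_id: str) -> int:
        """Hand out the lowest free port and record which sandbox owns it."""
        with self._lock:
            if not self._free:
                raise RuntimeError("port pool exhausted")
            port = min(self._free)
            self._free.remove(port)
            self._by_sandbox.setdefault(sandbox_id, set()).add(port)
            return port

    def register(self, sandbox_id: str, port: int) -> None:
        """Mark a port as in use, e.g. when rediscovering containers after a restart."""
        with self._lock:
            self._free.discard(port)
            self._by_sandbox.setdefault(sandbox_id, set()).add(port)

    def release_all(self, sandbox_id: str) -> None:
        """Return every port owned by a sandbox to the pool."""
        with self._lock:
            self._free.update(self._by_sandbox.pop(sandbox_id, set()))


def expose_port(host_port: int, container_ip: str, container_port: int, external: bool) -> str:
    """Assumed shape: host-mapped address for browsers, container address otherwise."""
    if external:
        return f"localhost:{host_port}"
    return f"{container_ip}:{container_port}"
```

Registering ports found on already-running containers before allocating new ones is what would keep a restarted backend from handing out a port that is still bound.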

💾 Local Storage Providers

LocalStorage providers for both ii_agent and ii_tool
Storage factory functions with local/GCS support
MCP tool image processing from sandbox containers

🔒 Resource Management

Browser MAX_TABS limit with automatic cleanup of oldest tabs
_on_page_change now enforces tab limits on externally-created pages (popups, target="_blank")
Shell MAX_SHELL_SESSIONS=10 limit with auto-close of oldest session when limit reached
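
The "close the oldest when the limit is reached" policy described above can be sketched with an ordered mapping. The class name `BoundedSessions` and its `on_close` callback are hypothetical; the PR's browser and shell tools may track age differently.

```python
from collections import OrderedDict
from typing import Callable


class BoundedSessions:
    """Hypothetical sketch: keep at most max_sessions entries, evicting the oldest."""

    def __init__(self, max_sessions: int, on_close: Callable[[str], None]) -> None:
        self._max = max_sessions
        self._on_close = on_close
        self._sessions: "OrderedDict[str, object]" = OrderedDict()

    def open(self, name: str, session: object) -> None:
        if name in self._sessions:
            # Replacing an existing session does not change the count.
            self._sessions[name] = session
            return
        # Evict in insertion order until there is room for the new session.
        while len(self._sessions) >= self._max:
            oldest, _ = self._sessions.popitem(last=False)
            self._on_close(oldest)
        self._sessions[name] = session

    def names(self) -> list[str]:
        return list(self._sessions)
```

The same shape would apply to tabs: an externally created page (a popup, say) goes through the same `open` path, so the limit holds regardless of who created it.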

🧠 LLM Enhancements

LLMConfig.get_max_output_tokens(): Model-specific output token limits (64K Claude 4, 100K o1, 16K GPT-4, 8K Gemini)
Anthropic native thinking blocks support via beta endpoint
Extended context (1M tokens) support for Claude models
Fix: the reasoning.effort parameter is now sent only to reasoning models
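
As a rough illustration of model-specific output limits, the lookup below maps model-name prefixes to a token cap. The prefixes, the exact integers chosen for "64K", "100K", "16K", and "8K", and the fallback default are assumptions; the actual `LLMConfig.get_max_output_tokens()` may match models and round values differently.

```python
# Hypothetical prefix table; only the approximate sizes come from the PR description.
_OUTPUT_LIMITS: list[tuple[tuple[str, ...], int]] = [
    (("claude-sonnet-4", "claude-opus-4"), 64_000),  # "64K Claude 4"
    (("o1",), 100_000),                              # "100K o1"
    (("gpt-4",), 16_384),                            # "16K GPT-4"
    (("gemini",), 8_192),                            # "8K Gemini"
]


def get_max_output_tokens(model: str, default: int = 4_096) -> int:
    """Return the output-token cap for a model name, or a default if unknown."""
    name = model.lower()
    for prefixes, limit in _OUTPUT_LIMITS:
        if name.startswith(prefixes):
            return limit
    return default
```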

💬 Chat Improvements

File search filtering by user_id only for cross-session access
SHA-256 content hash deduplication in OpenAI vector store
Reduced file_search max results to 3 to prevent context overflow
File corpus discovery so AI knows which files are searchable
hasattr guard for text attribute on image-only messages
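
Content-hash deduplication, as mentioned above, could look roughly like this. `DedupIndex` and the `upload` callable are hypothetical stand-ins; the real code talks to the OpenAI vector store, which this sketch does not attempt to reproduce.

```python
import hashlib
from typing import Callable


class DedupIndex:
    """Hypothetical sketch: skip uploads whose SHA-256 digest was already seen."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # hex digest -> stored file id

    def add(self, content: bytes, upload: Callable[[bytes], str]) -> tuple[str, bool]:
        """Return (file_id, uploaded); uploaded is False for duplicate content."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            return self._seen[digest], False
        file_id = upload(content)
        self._seen[digest] = file_id
        return file_id, True
```

Hashing the bytes rather than the filename means a renamed copy of the same file is still treated as a duplicate.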

🎨 Frontend

Added selectIsStopped selector for proper stopped state UI handling
Fixed agent task state transitions for cancelled sessions
Improved subagent container with session awareness

Test Coverage
125 files changed, 16,573 insertions(+), 297 deletions(-)

New test files:
test_browser_tab_limit.py - Browser MAX_TABS enforcement
test_resource_limits.py - Browser and shell session limits
test_shell_tools.py - Shell session management
test_llm_config.py - LLM configuration
test_generation_config_factory.py - Image/video generation configs
test_openai_dalle.py / test_openai_sora.py - Media generation clients
test_local_storage.py / test_tool_local_storage.py - Storage providers
test_file_*.py - File operation tools
test_terminal_manager.py - Terminal session management

Code Quality
Added comprehensive documentation for architecture and design

- Add DockerSandbox provider for air-gapped/local deployments
- Add PortPoolManager for centralized port allocation (30000-30999)
- Add LocalStorage providers for ii_agent and ii_tool
- Add MCP tool image processing from sandbox containers
- Add storage factory functions with local/GCS support
- Add test suite (143 tests passing)
- Fix connect() to register ports preventing conflicts on reconnect
- Fix delete() to cleanup orphaned volumes
- Update docs with port management and local sandbox setup

Chat file handling:
- Fix file_search filtering by user_id only (not session_id) for cross-session access
- Add SHA-256 content hash deduplication in OpenAI vector store
- Reduce file_search max results to 3 to prevent context overflow
- Add file corpus discovery so AI knows which files are searchable
- Fix reasoning.effort parameter only sent to reasoning models
- Add hasattr guard for text attribute on image-only messages

Sandbox management:
- Add orphan cleanup loop (5min interval) to remove containers without active sessions
- Add /internal/sandboxes/{id}/has-active-session endpoint for session verification
- Add port_manager.scan_existing_containers() to recover state on restart
- Add LOCAL_MODE config with orphan cleanup settings

Resource limits:
- Add MAX_TABS=20 limit in browser with force-close of oldest tabs
- Add MAX_SHELL_SESSIONS=10 limit in shell tool

Tests: Add 248 unit tests covering all changes
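
The orphan cleanup loop from the sandbox management items can be sketched as a periodic asyncio task. The three callables are hypothetical: in the PR, listing would presumably go through Docker and the session check through the has-active-session endpoint, but neither is reproduced here.

```python
import asyncio
from typing import Awaitable, Callable


async def cleanup_orphans_once(
    list_sandbox_ids: Callable[[], Awaitable[list[str]]],
    has_active_session: Callable[[str], Awaitable[bool]],
    remove: Callable[[str], Awaitable[None]],
) -> list[str]:
    """Remove every sandbox that has no active session; return the removed ids."""
    removed: list[str] = []
    for sandbox_id in await list_sandbox_ids():
        if not await has_active_session(sandbox_id):
            await remove(sandbox_id)
            removed.append(sandbox_id)
    return removed


async def orphan_cleanup_loop(
    list_sandbox_ids: Callable[[], Awaitable[list[str]]],
    has_active_session: Callable[[str], Awaitable[bool]],
    remove: Callable[[str], Awaitable[None]],
    interval_seconds: float = 300.0,  # the 5-minute interval from the description
) -> None:
    """Run the cleanup pass forever, logging rather than dying on errors."""
    while True:
        try:
            await cleanup_orphans_once(list_sandbox_ids, has_active_session, remove)
        except Exception as exc:  # keep the loop alive across transient failures
            print(f"orphan cleanup failed: {exc!r}")
        await asyncio.sleep(interval_seconds)
```

Splitting the single pass from the loop keeps the pass easy to unit test without waiting on the timer.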

## New Features
- expose_port(external) parameter: external=True returns localhost:port for browser access, external=False returns internal Docker IP for container-to-container communication
- LLMConfig.get_max_output_tokens(): Model-specific output token limits (64K Claude 4, 100K o1, 16K GPT-4, 8K Gemini)
- Browser MAX_TABS=20 limit with automatic cleanup of oldest tabs
- Shell session MAX_SHELL_SESSIONS=15 limit with clear error messages
- Anthropic native thinking blocks support via beta endpoint
- Extended context (1M tokens) support for Claude models

## Frontend Improvements
- Added selectIsStopped selector for proper stopped state UI handling
- Fixed agent task state transitions for cancelled sessions
- Improved subagent container with session awareness

## New Test Coverage (343 tests total)
- tests/llm/test_llm_config.py: LLMConfig.get_max_output_tokens() tests
- tests/tools/test_browser_tab_limit.py: Browser MAX_TABS enforcement
- tests/tools/test_resource_limits.py: Browser and shell session limits
- tests/tools/test_generation_config_factory.py: Image/video generation configs
- tests/tools/test_openai_dalle.py: DALL-E 3 image generation client
- tests/tools/test_openai_sora.py: Sora video generation client
- tests/storage/test_local_storage.py: LocalStorage.get_permanent_url()
- tests/storage/test_tool_local_storage.py: Tool server LocalStorage

## Code Quality
- Removed debug print statements from anthropic.py
- Removed trailing whitespace from all files
- Fixed test assertions to match implementation behavior

…proposed changes.

mdear · 2025-12-29T15:14:14Z

Hi dev team, loving this framework! I put it to the test by first making it run air-gapped (except for secure foundation providers such as Pinecone and Anthropic), then plugging in my new MCP server (a knowledge base for the seating/mobility vertical that can easily overwhelm any model's context unless carefully controlled and tuned).

Found a few stability issues and made some proposed fixes along the way.

All is open for constructive criticism, discussion and debate.

## Resource Management
- Browser: Increase MAX_TABS from 20 to 50
- Browser: Enforce tab limit in _on_page_change handler for popups/target=_blank
- Shell: Auto-close oldest session when MAX_SHELL_SESSIONS limit reached (replaces error-based rejection)

## Admin Tools
- Add scripts/admin_credits.sh for user credit management
  - list: View all users and balances
  - show/topup/set/bonus: Manage individual user credits

## Documentation
- Fix SANDBOX_DATABASE_URL to use asyncpg driver in .stack.env.local.example
- Update feature-branch-analysis.md: Claude 3.5 → Claude 4.5, improve wording

## Tests
- Update browser tests for MAX_TABS=50
- Add tests for _on_page_change tab limit enforcement
- Update shell tests for auto-close behavior

## Build
- uv.lock: Add prerelease-mode = "allow" option

…tension architecture docs
- Refactor run_stack.sh to support both cloud (E2B/ngrok) and local-only (Docker sandbox) modes
- Add start, stop, restart, status, logs, build, and setup commands
- Implement auto-detection of local mode based on env file presence
- Auto-create env files from templates with helpful setup instructions
- Add --local flag for explicit local mode selection
- Fix color output using $'...' syntax and printf for portability
- Add VS Code extension architecture documentation with detailed design specs
- Add executive summary document for VS Code extension integration

Converts HTML files (slides, pages, etc.) to a single multi-page PDF using Playwright/Chromium. Each HTML file becomes exactly one page with full content capture (no truncation).

Features:
- Automatic content height detection
- Configurable viewport width and DPI
- Supports directory input or specific files
- Progress output with quiet mode option


baalho · 2026-01-12T13:08:30Z

I really like the idea of a local sandbox for sensitive data or workflows. I am attempting to test this PR; thank you for your efforts.

…ox modes
- Add get_available_ports() method to SandboxInterface (defaults to None)
- Implement get_available_ports() in sandbox adapter to return [3000, 5173, 8080] for Docker/local mode
- Make RegisterPort tool description dynamic based on sandbox mode:
  - Local mode: instructs agent to use only ports 3000, 5173, or 8080
  - Cloud mode: allows any port (original behavior)
- Add unit tests for get_available_ports() in SandboxInterface
- Fix test_resource_limits.py to match current implementation:
  - Update MAX_TABS expected value from 20 to 50
  - Update shell session tests to expect auto-close behavior instead of rejection

This fixes the issue where agents in local Docker sandbox mode would try to use ports like 8000 that aren't pre-mapped, causing HTTP 500 errors.
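
A dynamic tool description of the kind described in this commit might be built as follows. The function name `register_port_description` and the wording of the strings are assumptions; only the `None`-means-unrestricted convention and the port list [3000, 5173, 8080] come from the commit message.

```python
from typing import Optional

# Ports the commit message says are pre-mapped in Docker/local mode.
LOCAL_PORTS = [3000, 5173, 8080]


def register_port_description(available_ports: Optional[list[int]]) -> str:
    """Build the tool description shown to the agent for the current sandbox mode."""
    base = "Register a port so the service running on it can be reached."
    if available_ports is None:
        # Cloud mode: no restriction, matching the original behavior.
        return base + " Any port may be used."
    ports = ", ".join(str(p) for p in available_ports)
    return f"{base} Only these pre-mapped ports are available: {ports}."
```

Putting the restriction in the description steers the agent away from unmapped ports before it tries one, rather than surfacing the failure afterwards as an HTTP 500.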

…rvice controls to run_stack.sh
- Fix LocalStorage.get_upload_signed_url() to use internal_url_base (default) for server-to-server uploads, preventing hang when backend tries to upload via localhost URL inside Docker container
- Add individual service start/stop/restart controls to run_stack.sh (e.g., ./scripts/run_stack.sh restart frontend --local)

…control wake

Slides content processor:
- Fix deadlock when LocalStorage upload URL points to backend itself
- Use direct write for LocalStorage (avoid self-request)
- Use async httpx with timeout for external storage (GCS/S3)

Subagent interrupt handling:
- Add SUB_AGENT_INTERRUPTED event type for proper UI state
- Add emit_completion_event() helper to BaseAgentTool
- Update all subagent tools to emit correct completion/interrupted events
- Handle asyncio.CancelledError in CodexAgent

Stack control:
- Replace run_stack.sh with unified stack_control.sh
- Add 'wake' command to restart stopped sandbox containers after reboot
- Add 'recover' command to fix stuck sessions and restart backend
- Auto-detect local vs cloud mode from running containers
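
The interrupt handling listed above follows a common asyncio pattern: catch `asyncio.CancelledError`, emit the interrupted event, then re-raise so cancellation still propagates. The sketch below assumes a plain `emit` callback and a hypothetical `SUB_AGENT_COMPLETE` event name; only `SUB_AGENT_INTERRUPTED` appears in the commit message, and the real `emit_completion_event()` helper may have a different signature.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_subagent(
    work: Callable[[], Awaitable[Any]],
    emit: Callable[[str], None],
) -> Any:
    """Run a subagent coroutine and emit exactly one terminal event."""
    try:
        result = await work()
    except asyncio.CancelledError:
        emit("SUB_AGENT_INTERRUPTED")
        raise  # re-raise so the caller still observes the cancellation
    emit("SUB_AGENT_COMPLETE")  # hypothetical name for the normal-completion event
    return result
```

Re-raising matters: swallowing `CancelledError` would leave the task looking as if it had finished normally, which is the kind of stale UI state the commit aims to avoid.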

mdear added 5 commits December 24, 2025 05:40

Added some unit tests and one fix to file_system/utils.py

af19ded

Added additional documentation to explain architecture and design of …

6e87a60

…proposed changes.

PhungVanDuy requested a review from khoangothe December 29, 2025 18:30

mdear added 4 commits December 29, 2025 18:21

mdear added 3 commits January 13, 2026 08:00
